Skip to content

Conversation

@yuancu
Copy link
Collaborator

@yuancu yuancu commented Dec 2, 2025

Description

This PR migrates from our custom type checking mechanism to Apache Calcite's native validation system by introducing a SqlNode validation layer. This addresses the lack of a proper SQL validation phase and resolves compatibility issues with user-defined types (UDTs).

This implements Approach 1 described in #3865.

How It Works

The PR introduces a validation layer in the query execution pipeline:

RelNode → SqlNode → [Validation & Type Coercion] → SqlNode → RelNode

This approach leverages Calcite's robust type system while preserving OpenSearch's custom type semantics through careful type mapping and restoration.

Benefits

  • Comprehensive type checking: Access to Calcite's full type checker suite (variable-length operands, same-operand constraints, OR-relationship type checkers) vs. our limited family-based checking
  • Function optimization: Ability to rewrite functions at the SqlNode level (e.g., sqrt(x)pow(x, 0.5)) before generating physical plans
  • Automatic type coercion: Implicit casting where type-safe and appropriate, improving query compatibility
  • Reduced maintenance: Leverage Calcite's battle-tested validation logic instead of maintaining custom implementations

Work Items / Implementation Details

Core Infrastructure

  • Introduced SqlNode validation layer with round-trip conversion (RelNode ↔ SqlNode)
  • Extended Calcite validator to support OpenSearch UDTs via type substitution
  • Implemented custom deriveType, coerceOperandType, and type inference methods
  • Made validator thread-safe with proper synchronization

Type System Enhancements

  • UDT Support: To support OpenSearch-specific UDTs (which Calcite doesn't natively support), we bridge Calcite's SqlTypeName enum with OpenSearch UDTs through dynamic type mapping:
    1. Convert UDTs to standard SQL types (e.g., EXPR_TIMESqlTypeName.TIME) before validation
    2. Apply Calcite's validation and type coercion
    3. Restore UDTs after validation to maintain plan correctness
  • Cross-type Comparisons: Override commonTypeForBinaryComparison to enable datetime cross-type comparisons (DATE vs TIME, etc.)
  • IP Type Handling: Emulated IP type using standard SQL types for validation, with special handling in operators
  • Composite Types: Extended SQL-UDT conversion to handle nested/composite types
  • Type Coercion: Use SAFE_CAST for string-to-number conversions to tolerate malformatted data; use CAST for literal numbers

Function & Operator Handling

  • Defined operand type checkers for all built-in functions:
    • Array functions (array_slice, reduce, mvappend, etc.) with proper type inference
    • JSON functions (json_extract, json_set, etc.)
    • Mathematical functions (corrected atan overloading, percentile approximations)
    • Aggregation functions (DISTINCT_COUNT_APPROX, COUNT(*) rewriting, etc.)
  • Function overloading support:
    • ADD operator: String concatenation vs numeric addition
    • ATAN: Single-operand vs two-operand versions
    • GEOIP: String overrides due to UDT erasure
  • Arithmetic operations between strings and numerics (implicit coercion)
  • Define SqlKind for DIVIDE and MOD UDFs

Query Construct Support

  • Preserve null ordering and collation through SqlNode round-trips
  • Support SEMI and ANTI joins in SQL conversion
  • Preserve sort orders in subqueries
  • Pass RelHint through conversions (added SqlHint for LogicalAggregate)
  • Handle windowed aggregates and bucket_nullable flags
  • Support IN/NOT IN with tuple inputs via row constructor rewriting

Dialect & Compatibility

  • Extended OpenSearch Spark SQL dialect for custom syntax
  • Fixed interval semantics mismatch between SQL and PPL
  • Fixed quarter interval bug in Calcite
  • Handle identifier expansion in aggregate functions
  • Properly handle LogicalValues for empty row generation

Edge Cases & Fixes

  • Skip validation for unsupported patterns:
    • Bin-on-timestamp operations (not yet implemented)
    • Group-by window functions (return original plan gracefully)
    • Specific aggregation patterns with LogicalValues
  • Fixed nullability attribute preservation through SAFE_CAST
  • Trim unused fields after SQL→Rel conversion
  • Ensure RelToSqlConverter is instantiated per-use (stateful component)
  • Fixed float literal handling with explicit casts
  • Remove JSON_TYPE operator insertion where inappropriate

Test Fixes

  • Fixed integration tests across multiple suites:
    • CalcitePPLDateTimeBuiltinFunctionIT: Interval semantics
    • CalcitePPLBuiltinFunctionIT: LOG function, sarg deserialization
    • CalciteArrayFunctionIT: Type checkers, reduce function inference
    • CalciteMathematicalFunctionIT, CalcitePPLAggregationIT
    • CalciteBinCommandIT: Timestamp operations, windowed aggregates in GROUP BY
    • CalciteStreamstatsCommandIT: Sort columns, bucket_nullable
    • CalcitePPLJsonBuiltinFunctionIT: String conversion
  • Updated all explain integration tests for plan changes
  • Fixed YAML tests, doctests, and unit tests
  • Updated ClickBench query plans

Code Cleanup

  • Removed legacy type checking from PPLFuncImpTable
  • Deprecated UDFOperandMetadata.wrapUDT interface
  • Removed unused classes: CalciteFuncSignature, PPLTypeChecker
  • Consolidated EnhancedCoalesce into built-in Coalesce
  • Removed type checkers from operator registration (now handled by Calcite)

Optimizations

  • Eliminated SAFE_CAST on non-string literal numbers (use CAST for better performance)
  • Investigated and addressed dedup optimization issues

Performance Impact

Dataset Scenario Analyze (ms) Optimize (ms) Execute (ms) Total Avg (ms)
clickbench Without Validation 7.93 8.29 0.95 17.53
With Validation 9.59 8.72 1.00 19.69
Difference +1.66 (+20.9%) +0.43 (+5.2%) +0.05 (+5.3%) +2.16 (+12.3%)
big5 Without Validation 3.52 4.27 0.80 8.99
With Validation 3.37 3.80 0.72 8.22
Difference -0.15 (-4.3%) -0.47 (-11.0%) -0.08 (-10.0%) -0.77 (-8.6%)
tpch Without Validation 8.65 692.00 183.15 1028.80
With Validation 7.91 689.16 175.63 1008.77
Difference -0.74 (-8.6%) -2.84 (-0.4%) -7.52 (-4.1%) -20.03 (-1.9%)

Profiled on personal laptop, each test runs twice, then averaged.

Conclusion: No significant performance degradation. ClickBench shows slight overhead (+12.3%) during analyze phase due to validation, but big5 and TPCH show improvements, likely from better query optimization enabled by proper type information.

Related Issues

Resolves #4636, resolves #3865

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@opensearch-project opensearch-project deleted a comment from coderabbitai bot Dec 2, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 3, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Improved type coercion and implicit casting (numeric, string, datetime, IP) for more flexible queries.
    • Added support for LEFT SEMI/ANTI joins and enhanced window functions with inherited ordering.
    • Expanded function operand validation (arrays, maps, JSON, math, IP) and variadic support.
    • More robust validation pipeline and SQL translation; richer, YAML-based explain outputs.
  • Bug Fixes

    • Corrected IP comparisons, empty-string checks, and date/time range handling.
    • Stabilized binning and mixed-type predicates (IN/BETWEEN).
  • Documentation

    • Updated examples, outputs, and formatting; marked some samples as non-executable.
  • Tests

    • Switched explain expectations to YAML; added/adjusted integration scenarios.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Adds a Calcite-based validation and implicit type coercion pipeline for PPL by round-tripping RelNode↔SqlNode. Introduces PPL-specific validator, type coercion/rules, convertlet table, dialect updates, converters/shuttles, and utilities. Refactors type utilities/factory, operand checkers, and UDF metadata. Simplifies function registries; removes legacy type checker/coercion utils. Updates extensive tests/resources and docs.

Changes

Cohort / File(s) Summary / Attention
Validation & Coercion Core
core/.../validate/PplValidator.java, core/.../validate/PplTypeCoercion.java, core/.../validate/PplTypeCoercionRule.java, core/.../validate/PplConvertletTable.java, core/.../validate/OpenSearchSparkSqlDialect.java, core/.../validate/SqlOperatorTableProvider.java, core/.../validate/ValidationUtils.java
New PPL validator, coercion rules/impl, convertlets (IP ops, ATAN), dialect cast/conformance, operator table provider, and validation utilities. High attention: correctness of type mapping UDT↔SQL, coercion paths, and liberal conformance behavior.
Rel/SQL Conversion & Shuttles
core/.../validate/converters/PplRelToSqlNodeConverter.java, core/.../validate/converters/PplSqlToRelConverter.java, core/.../validate/shuttles/PplRelToSqlRelShuttle.java, core/.../validate/shuttles/SkipRelValidationShuttle.java, core/.../validate/shuttles/SqlRewriteShuttle.java
Adds converters and shuttles for validation round-trip, semi/anti join handling, hints propagation, literal normalization, and selective skip logic. Review control-flow for joins, EXISTS/IN, and rex rewrites.
Planner Context & Visitors
core/.../calcite/CalcitePlanContext.java, core/.../calcite/CalciteRelNodeVisitor.java, core/.../calcite/CalciteRexNodeVisitor.java, core/.../calcite/utils/CalciteToolsHelper.java
Context gains lazy SqlValidator and operator table provider; embeds collations into window OVER; simplifies window function building; exposes preparing stmt; WIDTH_BUCKET error handling tweak. Validate thread-safety and correctness of collation injection.
Type System Refactor
core/.../calcite/utils/OpenSearchTypeFactory.java, core/.../calcite/utils/OpenSearchTypeUtil.java, core/.../calcite/ExtendedRexBuilder.java, core/.../calcite/utils/binning/BinnableField.java, core/.../calcite/utils/PlanUtils.java
New OpenSearchTypeUtil; factory leastRestrictive override and UDT handling; rex return-type derivation for numeric/char arithmetic; replace legacy type checks with new util. High attention: leastRestrictive correctness and regression in UDT paths.
Executor Integration
core/.../executor/QueryService.java
Inserts validation pipeline (Rel→Sql→validate→Rel) before optimization; tolerates known Calcite issues; integrates new converters/shuttles. Review error propagation and skip conditions.
Operand Types & UDF Metadata
core/.../calcite/utils/PPLOperandTypes.java, core/.../expression/function/UDFOperandMetadata.java
Replaces composite/wrapper checkers with direct OperandTypes; adds SCALAR checkers; simplifies UDF metadata wrap API. Examine operand arities and signatures for regressions.
Function Registry & Removal
core/.../expression/function/PPLFuncImpTable.java, core/.../expression/function/PPLBuiltinOperators.java, core/.../expression/function/CalciteFuncSignature.java (removed), core/.../expression/function/CoercionUtils.java (removed), core/.../expression/function/PPLTypeChecker.java (removed)
Moves to single implementation per function/agg; changes many operators to SqlFunction; removes custom type-check/coercion infra. High attention: resolution behavior changes and runtime selection (e.g., ADD CONCAT vs PLUS).
Collection/JSON/IP UDFs
core/.../CollectionUDF/*, core/.../jsonUDF/*, core/.../udf/ip/*, core/.../udf/binning/WidthBucketFunction.java, core/.../udf/math/*, core/.../LambdaUtils.java
Define operand metadata via OperandTypes or custom checkers; refine return type inference; remove lambda re-inference helpers; minor behavior adjustments (e.g., time/numeric checks via new util). Verify signature enforcement and inference.
Tests: Engine/Explain/Plans
integ-test/src/test/java/... (many), integ-test/src/test/resources/expectedOutput/calcite/* (many), core/src/test/...
Switch to YAML explains; add alternatives; adjust schemas/types, SAFE_CAST/script pushdowns, plan ordering; add helper for multiple expected YAMLs; remove outdated tests. Validate stability across environments.
API/Test Adjustments
api/.../UnifiedQueryTranspilerTest.java
Dialect import path change and expected SQL update for coercion behavior.
Docs
docs/user/ppl/*
Replace executable blocks with ignore, update examples and expected outputs formatting. Low risk.

Sequence Diagram(s)

sequenceDiagram
  participant P as QueryService
  participant R as RelNode (PPL plan)
  participant RS as PplRelToSqlNodeConverter
  participant V as PplValidator (Calcite)
  participant SR as PplSqlToRelConverter
  participant O as Optimizer

  P->>P: executeWithCalcite(...)
  P->>R: build initial plan
  P->>RS: Rel→Sql conversion (dialect-aware)
  RS-->>P: SqlNode
  P->>V: validate(SqlNode) with PplTypeCoercion/Convertlets
  V-->>P: validated SqlNode (with implicit casts)
  P->>SR: Sql→Rel conversion (semi/anti joins fixup)
  SR-->>P: validated RelNode
  P->>O: optimize(validated RelNode)
  O-->>P: optimized plan
  P-->>P: execute
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • ps48
  • kavithacm
  • derek-ho
  • joshuali925
  • ykmr1224
  • mengweieric
  • dai-chen
  • LantaoJin
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.82% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed Title concisely and accurately summarizes the primary change: enabling Calcite validation for type checking and coercion.
Description check ✅ Passed Description is detailed and directly related to the changeset, explaining the validation layer, UDT handling, coercion rules, and test updates.
Linked Issues check ✅ Passed Implements linked-issue goals: SqlNode validation round-trip, Calcite-based type coercion, UDT<->SQL mapping, and validator/coercion classes per #4636 and #3865.
Out of Scope Changes check ✅ Passed All substantive changes align with stated objectives (validation/coercion, UDT support, test/doc updates); no unrelated feature or module changes observed.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

@yuancu yuancu added the enhancement New feature or request label Dec 3, 2025
@yuancu yuancu force-pushed the issues/4636 branch 5 times, most recently from e085f81 to fc6dd27 Compare December 15, 2025 13:40
…checking

Signed-off-by: Yuanchun Shen <[email protected]>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/executor/QueryService.java

# Conflicts:
#	core/src/main/java/org/opensearch/sql/executor/QueryService.java
- 2 more ITs passed in PPLBuiltinFunctionIT

Signed-off-by: Yuanchun Shen <[email protected]>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
- this fix testRand, where desrialization of sarg does not restore its type
- todo: update the toRex in ExtendedRelJson to the align with the latest version

Signed-off-by: Yuanchun Shen <[email protected]>
…estamp; (time, timestamp) -> timestamp (1240/1599)

Signed-off-by: Yuanchun Shen <[email protected]>
- allow type cast
- rewrite call to sql compare to custom ip comapre

Signed-off-by: Yuanchun Shen <[email protected]>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
…d in mvindex's implementation (1580/2015)

Signed-off-by: Yuanchun Shen <[email protected]>
+ " LogicalTableScan(table=[[scott, EMP]])\n";
verifyLogical(root, expectedLogical);

String expectedSparkSql =
"SELECT `EMPNO`, `COMM`, COALESCE(`COMM`, `EMPNO`, 'fallback') `result`\n"
"SELECT `EMPNO`, `COMM`, COALESCE(`COMM`, `EMPNO`) `result`\n"
Copy link
Collaborator Author

@yuancu yuancu Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to check: is this optimized because EMPNO is nonnull?

the rest optimization seems reasonable.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yaml (1)

10-11: Physical plan uses SCRIPT-based pushdown with serialized expressions.

The physical plan now uses script-based pushdowns containing base64-encoded RexNode expressions for the SAFE_CAST comparisons. While this aligns with the PR's architecture changes, the opaque base64 payloads reduce the readability and maintainability of this expected output file.

Consider documenting (in a comment or companion file) what these serialized scripts represent, or providing a utility to decode them for debugging purposes. This would help future maintainers verify correctness when the expected output changes.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yaml (1)

13-13: Recommend verifying SCRIPT pushdown correctness and performance.

The physical plan now uses SCRIPT-based pushdowns with serialized, base64-encoded operator trees. This raises several concerns:

  1. Maintainability: The encoded script payloads are not human-readable, making debugging pushdown issues difficult.

  2. Performance: OpenSearch script-based filters are typically slower than native filters. Three separate script clauses for SAFE_CAST operations may impact query performance, especially on large datasets.

  3. Correctness: The test expectation validates plan shape but not execution results. Ensure:

    • The actual test verifies query results match expected values
    • Edge cases (malformatted data, NULL handling) are tested
    • Timezone handling in EXPR_TIMESTAMP VARCHAR is correct (note the UTC "Z" suffix in the JSON range query)

Consider:

  1. Adding integration tests that validate query results, not just plan shape
  2. Benchmarking query performance compared to the previous FILTER-based approach
  3. Verifying the script execution produces correct results with representative data

Based on learnings, test SQL generation and optimization paths for Calcite integration changes.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q30.yaml (1)

1-9: Ensure test coverage for validation pipeline edge cases across the suite.

This test file correctly follows the expected YAML format and represents a realistic clickbench query with a high field count (90 fields). As per the coding guidelines for test resources, verify that the broader test suite covers edge cases and boundary conditions for the new Calcite validation pipeline, such as:

  • Queries with various field index patterns and ranges
  • Type coercion scenarios (UDTs, cross-datetime comparisons, SAFE_CAST)
  • Preservation of hints, sort orders, and null directions across RelNode↔SqlNode conversion
  • Operator overload resolution (IP comparator, ATAN, ADD ambiguity)

Based on learnings: Test SQL generation and optimization paths for Calcite integration changes.

integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_field_list_max_option.yaml (1)

1-20: Consider additional test cases for edge cases.

This test validates the max=1 case with inner join and proper pushdown optimization. To improve test coverage for the max option feature, consider adding test cases for:

  1. Different max values (e.g., max=2, max=10)
  2. Join without max option to validate baseline behavior
  3. Edge case where right side has no matches (empty result)
  4. Comparing against the SEMI/ANTI join behavior where max has no effect (as noted in learnings)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb78ea7 and ff91bdc.

📒 Files selected for processing (71)
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_date_range_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_date_histogram_daily.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/date_histogram_minute_agg.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/multi_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/terms_significant_1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/terms_significant_2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_with_integer_span.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q19.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q30.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q40.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q42.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q43.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q8.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_join4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_with_script.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_chart_multi_group_key.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_chart_single_group_key.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_dedup.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push5.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_pushdown.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_pushdown.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_rename.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_agg_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_with_search.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_with_search_call.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_field_list_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown_bucket_nullable1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml
✅ Files skipped from review due to trivial changes (7)
  • integ-test/src/test/resources/expectedOutput/calcite/explain.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_with_search_call.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_chart_single_group_key.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_chart_multi_group_key.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_rename.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown3.yaml
🚧 Files skipped from review as they are similar to previous changes (27)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_agg_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q19.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push5.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_with_search.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/date_histogram_minute_agg.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q8.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q43.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_pushdown.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/terms_significant_2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q40.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_date_histogram_daily.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_join4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
🧰 Additional context used
📓 Path-based instructions (4)
integ-test/src/test/resources/**/*

⚙️ CodeRabbit configuration file

integ-test/src/test/resources/**/*: - Verify test data is realistic and representative

  • Check data format matches expected schema
  • Ensure test data covers edge cases and boundary conditions

Files:

  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q42.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown_bucket_nullable1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/multi_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_dedup.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q30.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_pushdown.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_field_list_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_date_range_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/terms_significant_1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_with_integer_span.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_with_script.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
core/src/main/java/**/*.java

⚙️ CodeRabbit configuration file

core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit

  • Public methods MUST have JavaDoc with @param, @return, and @throws
  • Follow existing function implementation patterns in the same package
  • New expression functions should follow ExpressionFunction interface patterns
  • Validate function naming follows project conventions (camelCase)

Files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q42.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown_bucket_nullable1.yaml
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/multi_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_dedup.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q30.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_pushdown.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_field_list_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_date_range_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/terms_significant_1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_with_integer_span.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_with_script.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_field_list_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_with_integer_span.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: In opensearch-project/sql, when creating custom Calcite RelNode classes that extend EnumerableLimitSort or other Calcite RelNode types, always override the `copy` method. Without overriding copy, the class will downgrade to its parent class type during copy operations, losing the custom implementation.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: test-sql-cli-integration (21)

penghuo
penghuo previously approved these changes Jan 5, 2026
Copy link
Collaborator

@penghuo penghuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

* @return {@code true} if the exception should be tolerated as a known Calcite bug, {@code false}
* otherwise
*/
public static boolean tolerantValidationException(Exception e) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many cases will hit this? It will use the original plan if it's tolerant, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roughly 5 cases. Yes, it will use the original plan if hit

private RelNode validate(RelNode relNode, CalcitePlanContext context) {
SkipRelValidationShuttle skipShuttle = new SkipRelValidationShuttle();
relNode.accept(skipShuttle);
if (skipShuttle.shouldSkipValidation()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a limitation? How many cases in our IT will hit this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are limitations, either from our implementation or calite's. Around a dozen ITs will hit this.

import org.apache.calcite.sql2rel.SqlToRelConverter;
import org.checkerframework.checker.nullness.qual.Nullable;

public class PplSqlToRelConverter extends SqlToRelConverter {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we've extended Calcite's SqlToRelConverter, we have chance to customized our plan by using hints.

e.g. When converting RelNode to SqlNode, we convert LogicalDedup to SqlSelect with dedup hints. And later when converting SqlNode to RelNode, we can detect that hint and add LogicalDedup on the original LogicalProject.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was also considering this approach. I hesitated because I was not sure which sql operation conforms with dedup's attributes. SqlSelect should to be a reasonable one.

SQL hints are already used to the empty bucket hints in the current implementation.

Copy link
Collaborator

@qianheng-aws qianheng-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious how much overhead will this validation process introduce?

@qianheng-aws
Copy link
Collaborator

This PR validate the whole SQL including type compatible, SQL grammer, field access and others. However what we need is only type compatibility since the relbuilder will not build an incorrect plan.

I'm thinking if we can only convert recall to sqlnode and invoke validateCall to do that part of validation(should include type coercion), and then convert it back to recall.

@yuancu
Copy link
Collaborator Author

yuancu commented Jan 8, 2026

I'm thinking if we can only convert recall to sqlnode and invoke validateCall to do that part of validation(should include type coercion), and then convert it back to recall.

It's kind of tricky, the validation of these are deeply entangled. E.g. if a function call is at a very bottom level of the AST, the validator has to go down the AST and validate all the nodes it comes across to build a proper validation scope for the function call. If we want to try to build them for these function calls, we may need to replicate those steps, which does not differ much from the original validation.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/user/ppl/interfaces/endpoint.md (1)

56-64: Fix duplicate "Description" section and add documentation for SAFE_CAST behavior in explain output.

The Explain section has a duplicate "### Description" heading at lines 58 and 61 that should be consolidated into a single introduction.

Additionally, while the examples show SAFE_CAST in the logical plan output (e.g., SAFE_CAST(30:BIGINT)), there is no explanation of what SAFE_CAST is, how it differs from CAST, or why it appears in the output. For users upgrading from previous versions and any automated tooling that parses explain output, clarification would be valuable:

  • Explain that SAFE_CAST is used in logical plans for type-safe casting with malformed string handling
  • Note that the physical plan shows the optimized form without SAFE_CAST (as seen in examples where logical shows SAFE_CAST(30:BIGINT) but physical shows >($5, 30))
  • Consider a brief note that this represents a change in output format for users parsing explain responses
🤖 Fix all issues with AI agents
In @core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java:
- Around line 129-168: The getValidator method in CalcitePlanContext currently
calls connection.createStatement() without ensuring the Statement is closed;
change the try/catch that obtains CalciteServerStatement to a try-with-resources
that declares a Statement (e.g., try (Statement stmt =
connection.createStatement()) { CalciteServerStatement statement =
stmt.unwrap(CalciteServerStatement.class); ... } ), move the validator creation
into that try block so the Statement is closed automatically, and remove the old
catch/throw pattern while preserving the existing IllegalStateException and
validator initialization logic.
🧹 Nitpick comments (5)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java (1)

458-465: Add verifySchema() for consistency with other BETWEEN tests.

All other BETWEEN tests in this file (e.g., testBetween, testBetweenWithDifferentTypes) verify the schema before checking data rows. This test should follow the same pattern for consistency.

♻️ Proposed fix
 @Test
 public void testBetweenWithMixedTypes() throws IOException {
   JSONObject actual =
       executeQuery(
           String.format(
               "source=%s | where age between '35' and 38 | fields firstname, age",
               TEST_INDEX_BANK));
+  verifySchema(actual, schema("firstname", "string"), schema("age", "int"));
   verifyDataRows(actual, rows("Hattie", 36), rows("Elinor", 36));
 }
docs/user/ppl/interfaces/endpoint.md (1)

122-124: Clarify the different constant representations across plan formats.

The same literal constant 30 appears in three different forms across the extended format output:

  • Logical plan: SAFE_CAST(30:BIGINT) (line 122)
  • Physical plan: 30:BIGINT (line 123)
  • Extended Java code: 30L (line 124)

While these representations serve different purposes (logical abstraction, physical optimization, runtime code), users may find the variations confusing without explanation.

Consider adding a brief explanation in the documentation about how the same operation is represented differently across plan stages, or reference Calcite's documentation on plan representations.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)

1810-1818: Note the TODO comment regarding pushdown behavior.

The TODO on line 1810 indicates that the chart with useother=true scenario currently doesn't achieve pushdown as expected. This is flagged for future investigation.

Consider creating a tracking issue for this if one doesn't already exist, to ensure it's addressed in a follow-up.

Would you like me to help create a GitHub issue to track this pushdown optimization opportunity?

core/src/main/java/org/opensearch/sql/executor/QueryService.java (1)

283-341: Well-structured validation pipeline with comprehensive JavaDoc.

The validate method correctly implements the RelNode → SqlNode → validate → RelNode round-trip as per PR objectives. A few observations:

  1. Line 311: Objects.requireNonNull(rewritten) will throw NPE if SqlRewriteShuttle returns null. Consider whether a more descriptive exception would aid debugging.

  2. Line 314: Silent fallback for tolerant validation exceptions may make debugging difficult. Consider adding debug-level logging when falling back.

  3. The method is at the ~50-line threshold per coding guidelines. The current organization is logical, but if it grows further, consider extracting the SqlToRelConverter configuration into a helper.

💡 Optional: Add debug logging for validation fallback
     } catch (CalciteContextException e) {
       if (ValidationUtils.tolerantValidationException(e)) {
+        log.debug("Tolerant validation fallback triggered: {}", e.getMessage());
         return relNode;
       }
       throw new ExpressionEvaluationException(e.getMessage(), e);
     }
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (1)

84-93: Static operatorTableProvider requires careful initialization ordering.

The static operatorTableProvider must be set before any CalcitePlanContext calls getValidator(). The IllegalStateException at line 141-142 guards against this, but consider documenting the initialization requirement in the class JavaDoc.

📝 Optional: Add class-level documentation for initialization order
/**
 * ...existing class doc...
 * 
 * <p><b>Initialization:</b> {@link #setOperatorTableProvider(SqlOperatorTableProvider)} 
 * must be called during module initialization before any context creates a validator.
 */
public class CalcitePlanContext {
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff91bdc and a6839e2.

📒 Files selected for processing (76)
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • docs/user/ppl/interfaces/endpoint.md
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAggregationIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/access_struct_subfield_with_item.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_date_histogram_daily.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/date_histogram_minute_agg.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/keyword_in_range.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/multi_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_keyword_can_match_shortcut.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_keyword_no_can_match_shortcut.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_numeric_asc_with_match.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_numeric_desc_with_match.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q11.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q12.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q13.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q14.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q15.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q22.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q23.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q31.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q32.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q42.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q43.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q8.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_bin_aligntime.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_bin_span.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_eval_max.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_eval_min.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_fillnull_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_fillnull_value_syntax.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_date_string.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_time_string.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_timestamp_string.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_with_search.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_push.yaml
✅ Files skipped from review due to trivial changes (1)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml
🚧 Files skipped from review as they are similar to previous changes (12)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAggregationIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_fillnull_value_syntax.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_time_string.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_fillnull_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_keyword_no_can_match_shortcut.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_eval_max.yaml
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java
  • integ-test/src/test/resources/expectedOutput/calcite/access_struct_subfield_with_item.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_date_histogram_daily.yaml
🧰 Additional context used
📓 Path-based instructions (8)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java: - Integration tests MUST use valid test data from resources

  • Verify test data files exist in integ-test/src/test/resources/
  • Check test assertions are meaningful and specific
  • Validate tests clean up resources after execution
  • Ensure tests are independent and can run in any order
  • Flag tests that reference non-existent indices (e.g., EMP)
  • Verify integration tests are in correct module (integ-test/)
  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify NULL input tests for all new functions

  • Check boundary condition tests (min/max values, empty inputs)
  • Validate error condition tests (invalid inputs, exceptions)
  • Ensure multi-document tests for per-document operations
  • Flag smoke tests without meaningful assertions
  • Check test naming follows pattern: test
  • Verify test data is realistic and covers edge cases
  • Verify test coverage for new business logic
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
integ-test/src/test/resources/**/*

⚙️ CodeRabbit configuration file

integ-test/src/test/resources/**/*: - Verify test data is realistic and representative

  • Check data format matches expected schema
  • Ensure test data covers edge cases and boundary conditions

Files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_eval_min.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q32.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_bin_span.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q31.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/keyword_in_range.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/date_histogram_minute_agg.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_numeric_asc_with_match.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q23.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_with_search.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q11.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q8.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q42.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q14.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q22.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q15.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q12.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_keyword_can_match_shortcut.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q13.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/multi_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_bin_aligntime.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_timestamp_string.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_numeric_desc_with_match.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_date_string.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q43.yaml
core/src/main/java/**/*.java

⚙️ CodeRabbit configuration file

core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit

  • Public methods MUST have JavaDoc with @param, @return, and @throws
  • Follow existing function implementation patterns in the same package
  • New expression functions should follow ExpressionFunction interface patterns
  • Validate function naming follows project conventions (camelCase)

Files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

⚙️ CodeRabbit configuration file

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java: - Flag methods >50 lines - this file is known to be hard to read

  • Suggest extracting complex logic into helper methods
  • Check for code organization and logical grouping
  • Validate all RelNode types are handled

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🧠 Learnings (9)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml
  • core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_eval_min.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q32.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex1.yaml
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_bin_span.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q31.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yaml
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr1.yaml
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/keyword_in_range.yaml
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/date_histogram_minute_agg.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yaml
  • docs/user/ppl/interfaces/endpoint.md
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_numeric_asc_with_match.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q23.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_with_search.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q11.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q8.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q42.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q14.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q22.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q15.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q12.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_keyword_can_match_shortcut.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q13.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/multi_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_bin_aligntime.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_timestamp_string.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_numeric_desc_with_match.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_date_string.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q43.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-29T05:32:11.893Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:11.893Z
Learning: In opensearch-project/sql, when creating custom Calcite RelNode classes that extend EnumerableLimitSort or other Calcite RelNode types, always override the `copy` method. Without overriding copy, the class will downgrade to its parent class type during copy operations, losing the custom implementation.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yaml
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yaml
  • docs/user/ppl/interfaces/endpoint.md
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yaml
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • docs/user/ppl/interfaces/endpoint.md
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_bin_aligntime.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_date_string.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Development requires JDK 21 for the OpenSearch SQL project

Applied to files:

  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
🧬 Code graph analysis (2)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java (1)
integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java (1)
  • MatcherUtils (43-502)
core/src/main/java/org/opensearch/sql/executor/QueryService.java (9)
core/src/main/java/org/opensearch/sql/calcite/validate/OpenSearchSparkSqlDialect.java (1)
  • OpenSearchSparkSqlDialect (28-113)
core/src/main/java/org/opensearch/sql/calcite/validate/converters/PplRelToSqlNodeConverter.java (1)
  • PplRelToSqlNodeConverter (36-172)
core/src/main/java/org/opensearch/sql/calcite/validate/converters/PplSqlToRelConverter.java (1)
  • PplSqlToRelConverter (24-92)
core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/PplRelToSqlRelShuttle.java (1)
  • PplRelToSqlRelShuttle (28-83)
core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/SkipRelValidationShuttle.java (1)
  • SkipRelValidationShuttle (85-175)
core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/SqlRewriteShuttle.java (1)
  • SqlRewriteShuttle (21-69)
common/src/main/java/org/opensearch/sql/common/utils/QueryContext.java (1)
  • QueryContext (18-87)
core/src/main/java/org/opensearch/sql/exception/CalciteUnsupportedException.java (1)
  • CalciteUnsupportedException (8-17)
core/src/main/java/org/opensearch/sql/exception/ExpressionEvaluationException.java (1)
  • ExpressionEvaluationException (9-18)

Comment on lines +129 to +168
public SqlValidator getValidator() {
if (validator == null) {
synchronized (this) {
// Double-Checked Locking for thread-safety
if (validator == null) {
final CalciteServerStatement statement;
try {
statement = connection.createStatement().unwrap(CalciteServerStatement.class);
} catch (SQLException e) {
throw new RuntimeException(e);
}
if (operatorTableProvider == null) {
throw new IllegalStateException(
"SqlOperatorTableProvider must be set before creating CalcitePlanContext");
}
SqlValidator.Config validatorConfig =
SqlValidator.Config.DEFAULT
.withTypeCoercionRules(PplTypeCoercionRule.instance())
.withTypeCoercionFactory(PplTypeCoercion::create)
// Use lenient conformance for PPL compatibility
.withConformance(OpenSearchSparkSqlDialect.DEFAULT.getConformance())
// Use Spark SQL's NULL collation (NULLs sorted LOW/FIRST)
.withDefaultNullCollation(NullCollation.LOW)
// This ensures that coerced arguments are replaced with cast version in sql
// select list because coercion is performed during select list expansion during
// sql validation. Affects 4356.yml
// See SqlValidatorImpl#validateSelectList and AggConverter#translateAgg
.withIdentifierExpansion(true);
validator =
PplValidator.create(
statement,
config,
operatorTableProvider.getOperatorTable(),
TYPE_FACTORY,
validatorConfig);
}
}
}
return validator;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -n "CalciteServerStatement" --type java -B 2 -A 2

Repository: opensearch-project/sql

Length of output: 5211


🏁 Script executed:

# Check if CalciteServerStatement is defined in this codebase or imported from Calcite
rg -n "class CalciteServerStatement" --type java

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Check for try-with-resources usage with statements in similar code
rg -n "createStatement()" --type java -B 2 -A 2 | head -50

Repository: opensearch-project/sql

Length of output: 4675


🏁 Script executed:

# Look at the imports and see where CalciteServerStatement comes from
head -50 core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java

Repository: opensearch-project/sql

Length of output: 2005


🏁 Script executed:

# Search for close() calls on statement variables
rg -n "statement\.close()" --type java

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Check CalciteToolsHelper usage after statement creation
sed -n '160,180p' core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java

Repository: opensearch-project/sql

Length of output: 891


🏁 Script executed:

# Check how validator is used after it's created - is it long-lived?
sed -n '129,200p' core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java

Repository: opensearch-project/sql

Length of output: 2766


🏁 Script executed:

# Look at full method context of CalciteToolsHelper where statement is created
sed -n '140,180p' core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java

Repository: opensearch-project/sql

Length of output: 1652


Wrap connection.createStatement() in try-with-resources for proper resource cleanup.

The CalciteServerStatement created on line 136 is never explicitly closed. While this pattern is used consistently elsewhere in the codebase (e.g., CalciteToolsHelper.java:168), the coding guidelines require using try-with-resources for proper resource cleanup. Either use try-with-resources or document why this Calcite framework object does not require explicit closing.

Example fix:

final CalciteServerStatement statement;
try (Statement stmt = connection.createStatement()) {
  statement = stmt.unwrap(CalciteServerStatement.class);
  // rest of initialization
}
🤖 Prompt for AI Agents
In @core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java around
lines 129 - 168, The getValidator method in CalcitePlanContext currently calls
connection.createStatement() without ensuring the Statement is closed; change
the try/catch that obtains CalciteServerStatement to a try-with-resources that
declares a Statement (e.g., try (Statement stmt = connection.createStatement())
{ CalciteServerStatement statement = stmt.unwrap(CalciteServerStatement.class);
... } ), move the validator creation into that try block so the Statement is
closed automatically, and remove the old catch/throw pattern while preserving
the existing IllegalStateException and validator initialization logic.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/user/ppl/functions/collection.md (1)

165-201: Two TRANSFORM examples are appropriately disabled pending issue #4972, but note that MVMAP has the same pattern with three similarly disabled examples.

The TRANSFORM function examples at lines 167–172 and 185–190 are marked with ppl ignore and reference issue #4972. This is a consistent pattern—other lambda-based functions like FORALL, EXISTS, FILTER, and REDUCE retain working examples, suggesting the issue specifically affects TRANSFORM and MVMAP transformations. Verify that issue #4972 adequately captures all affected examples (TRANSFORM: 2 examples; MVMAP: 3 examples at lines 819–824, 837–842, 859–864) to ensure comprehensive tracking.

🧹 Nitpick comments (1)
docs/user/ppl/functions/collection.md (1)

818-875: MVMAP examples disabled with same pattern as TRANSFORM.

Three MVMAP examples are disabled, consistent with the TRANSFORM failures. Both functions involve array element transformations (lambdas for TRANSFORM, expressions for MVMAP).

Please verify that issue #4972 explicitly covers both TRANSFORM and MVMAP failures, as the root cause may be related to how the new Calcite validation handles array element-level operations.

Consider consolidating the TODO comments or creating a more descriptive issue title that clearly indicates "lambda and array transformation validation" to help future maintainers understand the scope.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a6839e2 and 906d999.

📒 Files selected for processing (4)
  • docs/user/ppl/cmd/bin.md
  • docs/user/ppl/cmd/patterns.md
  • docs/user/ppl/functions/collection.md
  • docs/user/ppl/functions/conversion.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/user/ppl/cmd/patterns.md
  • docs/user/ppl/functions/conversion.md
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: test-sql-cli-integration (21)
🔇 Additional comments (1)
docs/user/ppl/cmd/bin.md (1)

501-508: The example correctly uses the ppl ignore directive, which is the documented approach for marking code blocks that should be skipped during testing. This pattern is consistently used throughout the codebase (in limitations.md, functions/collection.md, and others) to handle incomplete or broken examples—paired with HTML TODO comments explaining the issue. No changes are needed; the implementation follows established conventions.

Likely an incorrect or invalid review comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java (1)

1014-1017: Test assertion doesn't match documented intent.

The test comment (lines 1004-1008) states it should verify "the error message clearly explains the limitation and suggests solutions," but the assertion now checks for an internal Calcite error (resolving method 'minus[...]') that provides no user-friendly guidance.

The TODO referencing issue #4973 acknowledges this is a temporary state. When addressing that issue, consider updating the error handling to surface a user-friendly message explaining that bins on timestamp fields requires pushdown to be enabled, rather than exposing internal Calcite method resolution failures.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 906d999 and 38ef160.

📒 Files selected for processing (6)
  • docs/user/ppl/cmd/bin.md
  • docs/user/ppl/cmd/patterns.md
  • docs/user/ppl/functions/collection.md
  • docs/user/ppl/functions/conversion.md
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml
🚧 Files skipped from review as they are similar to previous changes (5)
  • docs/user/ppl/cmd/bin.md
  • docs/user/ppl/cmd/patterns.md
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml
  • docs/user/ppl/functions/conversion.md
  • docs/user/ppl/functions/collection.md
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java: - Integration tests MUST use valid test data from resources

  • Verify test data files exist in integ-test/src/test/resources/
  • Check test assertions are meaningful and specific
  • Validate tests clean up resources after execution
  • Ensure tests are independent and can run in any order
  • Flag tests that reference non-existent indices (e.g., EMP)
  • Verify integration tests are in correct module (integ-test/)
  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify NULL input tests for all new functions

  • Check boundary condition tests (min/max values, empty inputs)
  • Validate error condition tests (invalid inputs, exceptions)
  • Ensure multi-document tests for per-document operations
  • Flag smoke tests without meaningful assertions
  • Check test naming follows pattern: test
  • Verify test data is realistic and covers edge cases
  • Verify test coverage for new business logic
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Verify AST generation for new PPL parser syntax
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: test-sql-cli-integration (21)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java (1)

1004-1017: Test assertion doesn't match stated verification intent.

The comment states "Verify the error message clearly explains the limitation and suggests solutions" and the assertion message expects "clear error message about bins parameter requirements," but the actual check validates an internal Calcite error (resolving method 'minus[class java.lang.String, class java.lang.String]'). This cryptic message won't help users understand that they need to enable pushdown for timestamp binning.

The TODO referencing #4973 acknowledges this, but consider updating the assertion message (lines 1011-1013) to reflect what's actually being checked, so the test failure message is accurate if the internal error changes:

     assertTrue(
-        "Expected clear error message about bins parameter requirements on timestamp fields, but"
-            + " got: "
+        "Expected Calcite method resolution error (tracked by #4973 for proper user message), but"
+            + " got: "
             + errorMessage,
🤖 Fix all issues with AI agents
In @docs/user/ppl/cmd/bin.md:
- Around line 501-502: Example 20 in the docs contains a non-functional ppl code
block marked only with an HTML TODO comment; either remove the entire Example 20
block until issue #4973 is resolved or make the limitation obvious to users by
replacing the HTML comment with a visible admonition (warning note) immediately
before the ```ppl ignore``` block that states the feature is not supported and
references issue #4973; ensure the visible note uses the same documentation
styling (admonition/warning) as other docs so users see it when rendered.
🧹 Nitpick comments (5)
integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_big_range_big_term_query.yaml (1)

1-1: Consider removing the leading blank line.

YAML files typically don't start with a blank line. While this won't affect parsing, it's inconsistent with standard YAML formatting conventions.

Suggested fix
-
 calcite:
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml (1)

12-15: Physical plan demonstrates proper pushdown optimization.

The expected physical plan correctly shows:

  • Dual-level EnumerableLimit for system limit (10000) and user pagination (offset 100, fetch 10)
  • Aggregation pushed to OpenSearch with multi_terms sized at 110 to satisfy offset + fetch
  • Boolean query structure combining term, range, script, and exists filters

One observation: the embedded base64-encoded scripts make it difficult to verify the serialized expressions are correct if they need future updates. Consider adding a comment in the test or documentation explaining how to decode/encode these blocks for maintenance purposes.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml (1)

1-1: Minor: Leading empty line in YAML file.

The file begins with an empty line. While this doesn't affect parsing, it's unconventional for YAML files. Consider removing for consistency with other expected output files, or ignore if this is a generated artifact.

integ-test/src/test/resources/expectedOutput/calcite/explain_output.yaml (1)

1-1: Unnecessary leading empty line.

The empty line at the start of the file is not needed and could be removed for cleaner formatting.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml (1)

1-1: Minor: Spurious blank line at file start.

An empty line at the beginning of the YAML file is unnecessary and could be removed for cleaner formatting, though it won't affect functionality.

🧹 Suggested fix
-
 calcite:
   logical: |
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 906d999 and 9ecc8ff.

📒 Files selected for processing (45)
  • core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
  • docs/user/ppl/cmd/bin.md
  • docs/user/ppl/cmd/patterns.md
  • docs/user/ppl/functions/collection.md
  • docs/user/ppl/functions/conversion.md
  • docs/user/ppl/interfaces/endpoint.md
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_big_range_big_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_small_range_big_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_small_range_small_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_disjunction_big_range_small_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_numeric.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_true_not_pushed.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_text_type_no_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_agg_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_function_script_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_script_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_then_limit_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_10_filter_5_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_then_filter_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_basic.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_dedup_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_output.yaml
✅ Files skipped from review due to trivial changes (3)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_agg_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_dedup_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter.yaml
🚧 Files skipped from review as they are similar to previous changes (22)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_true_not_pushed.yaml
  • docs/user/ppl/cmd/patterns.md
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_function_script_push.yaml
  • docs/user/ppl/functions/conversion.md
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having2.yaml
  • docs/user/ppl/interfaces/endpoint.md
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr1.yaml
  • core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_text_type_no_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_script_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_then_limit_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_10_filter_5_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_small_range_big_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_numeric.yaml
🧰 Additional context used
📓 Path-based instructions (6)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java: - Integration tests MUST use valid test data from resources

  • Verify test data files exist in integ-test/src/test/resources/
  • Check test assertions are meaningful and specific
  • Validate tests clean up resources after execution
  • Ensure tests are independent and can run in any order
  • Flag tests that reference non-existent indices (e.g., EMP)
  • Verify integration tests are in correct module (integ-test/)
  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify NULL input tests for all new functions

  • Check boundary condition tests (min/max values, empty inputs)
  • Validate error condition tests (invalid inputs, exceptions)
  • Ensure multi-document tests for per-document operations
  • Flag smoke tests without meaningful assertions
  • Check test naming follows pattern: test
  • Verify test data is realistic and covers edge cases
  • Verify test coverage for new business logic
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
integ-test/src/test/resources/**/*

⚙️ CodeRabbit configuration file

integ-test/src/test/resources/**/*: - Verify test data is realistic and representative

  • Check data format matches expected schema
  • Ensure test data covers edge cases and boundary conditions

Files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_then_filter_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_big_range_big_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_basic.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_disjunction_big_range_small_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_output.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_small_range_small_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_then_filter_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_big_range_big_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_basic.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_disjunction_big_range_small_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_output.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_small_range_small_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_basic.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_output.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having3.yaml
📚 Learning: 2025-12-29T05:32:11.893Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:11.893Z
Learning: In opensearch-project/sql, when creating custom Calcite RelNode classes that extend EnumerableLimitSort or other Calcite RelNode types, always override the `copy` method. Without overriding copy, the class will downgrade to its parent class type during copy operations, losing the custom implementation.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: WhiteSource Security Check
🔇 Additional comments (31)
integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yaml (2)

7-7: LGTM! Explicit NULLS FIRST aligns with PR objectives.

The addition of NULLS FIRST to the window function correctly reflects the PR's goal of preserving null direction during RelNode ↔ SqlNode round trips. This makes the null ordering behavior explicit in the explain plan output, which is consistent with Calcite's validation system behavior.

The test data is realistic and representative of the expected Calcite explain output format.


13-13: Physical plan update is consistent with logical plan changes.

The physical plan correctly mirrors the logical plan's null ordering semantics. The OpenSearch aggregation query DSL with "order":[{"_count":"desc"},{"_key":"asc"}] is the appropriate translation for the rare/top operation with the preserved null direction.

integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_basic.yaml (1)

1-23: LGTM! Type annotation simplification aligns with Calcite validation changes.

The removal of explicit :VARCHAR annotations from string literals ('young', 'adult') is consistent with the PR's migration to Calcite's type validation system. The plan semantics remain correct:

  • Boundary conditions are properly handled (age < 30 for 'young', age >= 30 for 'adult' with no overlap).
  • Filter pushdown to OpenSearch index scans is preserved.
  • Aggregation and projection logic is intact.
integ-test/src/test/resources/expectedOutput/calcite/explain_limit_then_filter_push.yaml (2)

7-8: LGTM - LogicalProject addition is expected from SqlNode round-trip.

The explicit LogicalProject wrapping the index scan is a consequence of the RelNode → SqlNode → RelNode validation round-trip introduced by this PR. The physical plan shows this doesn't impact optimization effectiveness—the push-down still correctly projects only [age] and applies LIMIT->5 at the scan level.


11-11: LGTM - Explicit BIGINT typing reflects Calcite's type coercion.

The change from 30 to 30:BIGINT is the expected result of Calcite's validation system adding explicit type information to literals during coercion. This ensures type-safe comparison with the age field.

integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_big_range_big_term_query.yaml (1)

3-10: LGTM!

The test data is well-structured and covers the expected behavior for the Calcite integration:

  • The logical plan correctly represents the query structure with explicit type-safe range bounds (>=($28, 1), <=($28, 100)).
  • The physical plan properly reflects the typed Sarg representation (Sarg[[1L:BIGINT..100L:BIGINT]]:BIGINT), aligning with the PR's type coercion objectives.
  • The OpenSearch query builder correctly translates to a boolean query with term and range clauses, including boundary values.
  • Edge cases are covered with inclusive range boundaries.
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yaml (1)

1-11: Test data structure looks correct for dedup expression scenario.

The expected output covers:

  • Logical plan: Properly demonstrates dedup via ROW_NUMBER() OVER (PARTITION BY new_gender, new_state) with filter <= 2 to keep top 2 rows per partition.
  • Physical plan: Shows aggregation pushdown to OpenSearch with composite buckets on the partition keys and top_hits to retrieve actual documents.

A few observations:

  1. The physical plan on line 11 is extremely long (single line with Base64-encoded scripts), which is typical for these files but makes diff reviews difficult.
  2. The Base64 payloads encode LOWER function metadata for gender.keyword and state.keyword fields—consistent with the LOWER($4) and LOWER($7) expressions in the logical plan.

Given the PR objectives mention "Investigate dedup optimization failure" as remaining work, this alternative expected output file likely represents an acceptable plan shape during the investigation. Please confirm this test case aligns with the expected behavior after the Calcite validation/coercion changes.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml (1)

1-11: LGTM - Expected output correctly captures dedup-with-expressions pushdown.

The test data is realistic and representative:

  • Uses typical account index fields (account_number, firstname, balance, gender, state, etc.)
  • Validates the ROW_NUMBER-based dedup logic with keepempty=2 (condition <=($19, 2))
  • Physical plan correctly shows composite aggregation pushdown with LOWER function scripts for new_gender and new_state
  • top_hits size of 2 and missing_bucket=false properly align with the logical plan's dedup semantics and IS NOT NULL filters

The base64-encoded script content is expected for Calcite's serialized script handling.

integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_disjunction_big_range_small_term_query.yaml (2)

1-8: LGTM! Logical plan correctly represents the disjunction query.

The expected output properly captures the OR condition between a term match and a numeric range. The plan structure (SystemLimit → Sort → Project → Filter → IndexScan) is appropriate for this query pattern.

Minor: Line 1 adds an empty line at the start of the file, which is a minor formatting inconsistency but doesn't affect functionality.


9-10: LGTM! Physical plan correctly pushes down the disjunction to OpenSearch.

The pushdown context properly translates the OR condition:

  • Calcite's Sarg[[1L:BIGINT..100L:BIGINT]] representation for the range is appropriate
  • The OpenSearch DSL correctly uses bool.should to handle the disjunction with both term and range clauses
  • Predicate pushdown is successfully demonstrated, which aligns with the PR objectives for Calcite validation integration

This test case effectively covers the edge case of mixed query types (term + range) in a disjunction.

docs/user/ppl/functions/collection.md (2)

165-201: LGTM! Properly documented known issue.

The TRANSFORM examples are consistently marked with TODO comments and ppl ignore blocks, referencing issue #4972 for tracking. The expected outputs are preserved, which will facilitate re-enabling these examples once the issue is resolved.


818-875: LGTM! Consistent handling of MVMAP examples.

All three MVMAP examples are uniformly marked with TODO comments and ppl ignore blocks, referencing the same issue #4972. The approach is consistent with the TRANSFORM examples and preserves all expected outputs for future validation.

integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_small_range_small_term_query.yaml (1)

1-10: LGTM! Test data is realistic and covers expected edge cases.

The expected output correctly validates the Calcite validation/coercion pipeline:

  • The logical plan reflects the Sort → Project → Filter ordering consistent with PR objectives.
  • The filter predicate uses compound range semantics (AND(>=($28, 10), <=($28, 20))) and OR disjunction.
  • Field indices in the physical plan ($13metrics.size, $15aws.cloudwatch.log_stream) correctly map to the projected fields.
  • The Sarg[[10L:BIGINT..20L:BIGINT]]:BIGINT notation aligns with explicit type-casting introduced by the validation system.

The numeric bounds appearing as 10.0/20.0 (floats) in the ES query vs 10L:BIGINT in the Sarg is expected behavior for OpenSearch compatibility.

integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml (2)

7-11: Redundant LogicalProject nodes appear in the plan.

Lines 7 and 9 contain identical LogicalProject nodes projecting all 17 fields. This redundancy likely results from the RelNode → SqlNode → RelNode round-trip introduced by the Calcite validation phase. While functionally correct, this may indicate a missed optimization opportunity.

Given the PR TODOs mention "Investigate dedup optimization failure," this is expected behavior for now and should be addressed as part of that follow-up.

Confirm the redundant projections don't impact query performance in production and are handled by downstream optimization passes.


13-13: BIGINT-typed Sarg representation is correct and consistent.

The physical plan correctly uses Sarg[(30L:BIGINT..40L:BIGINT)]:BIGINT for the age range filter, and the JSON sourceBuilder correctly translates this to:

  • "include_lower":false, "include_upper":false matching the exclusive bounds (30..40)
  • Balance filter >($0, 10000) correctly maps to "include_lower":false

The field indexing after PROJECT->[balance, age] is also correct: $0 = balance, $1 = age.

integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having3.yaml (1)

4-13: LGTM! Type coercion changes correctly reflect Calcite validation behavior.

The updated expected output properly captures the new Calcite-based type coercion:

  1. Logical plan (line 5): The explicit CAST(1000):DOUBLE NOT NULL for comparing against new_avg (derived from AVG, which returns DOUBLE) is the expected implicit cast insertion from Calcite's type coercion.

  2. Physical plan (line 12): Typed literals (1000.0E0:DOUBLE, 1:BIGINT) correctly match the operand types of the comparisons.

  3. Consistency: The second comparison >($4, 1) doesn't require a cast in the logical plan since COUNT() returns BIGINT and the literal 1 is compatible (typed as 1:BIGINT in the physical plan).

Test data is realistic, covers the type coercion edge case for numeric comparisons, and aligns with the PR's objective of migrating to Calcite's validation system.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml (1)

1-11: Logical plan structure looks correct.

The expected logical plan covers realistic analytics query patterns including:

  • Pagination with offset/limit and system query size limits
  • NULL handling with IS NOT NULL predicates
  • Complex filter conditions with OR clauses and date range comparisons
  • Aggregation with COUNT() grouped by multiple fields
  • SAFE_CAST usage aligns with the PR's type coercion approach for handling type mismatches
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml (2)

3-10: Logical plan structure is correct and consistent.

The dedup plan correctly:

  • Partitions ROW_NUMBER() by $4, $5 (new_gender, new_state) which are the LOWER expression results
  • Filters nulls on partition keys before windowing to ensure correct dedup behavior
  • Maintains sort order for deterministic dedup results

The column references align properly with the projection schema defined in line 9.


11-12: Physical plan pushdown is consistent with logical plan.

The OpenSearchRequestBuilder correctly reflects:

  • Composite aggregation on new_gender/new_state with LOWER scripts matching the logical projection
  • top_hits with size:2 matching the dedup limit condition <=($6, 2)
  • Proper _source includes and script_fields for computed columns

The test data is realistic and representative of Calcite's explain output for dedup with expressions.

integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml (2)

1-10: LGTM! Logical plan structure is consistent and correctly represents dedup semantics.

The column indices are properly aligned:

  • $28 correctly references metrics.size for both the partition key and NULL filter
  • $45 correctly references _row_number_dedup_ for the dedup filter (<= 1)
  • Outer projection indices match the inner projection field positions

The plan correctly handles the edge case of NULL partition keys with IS NOT NULL($28) before windowing.


12-13: Physical plan pushdown correctly implements dedup semantics.

The OpenSearch aggregation structure is appropriate:

  • Composite aggregation with top_hits(size=1) correctly deduplicates by taking first row per metrics.size partition
  • _source includes field list matches the logical projection
  • missing_bucket:false correctly excludes NULL partition keys (aligned with the IS NOT NULL filter in logical plan)
integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml (2)

5-9: Field indices correctly reflect added derived field.

The field index shifts ([$14]-[$26] and join condition =($13, $20)) are consistent with the addition of the $f13 derived field in line 7. The CAST(REX_EXTRACT(...)):VARCHAR wrapper aligns with the PR's Calcite validation system that automatically inserts type casts during coercion.


20-21: Physical plan correctly propagates CAST to pushdown context.

The EnumerableCalc and CalciteEnumerableIndexScan correctly incorporate the CAST(REX_EXTRACT(...)):VARCHAR in both the projection and the SORT_EXPR pushdown. This ensures the sort-merge join operates on type-coerced values, consistent with the logical plan.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml (2)

6-14: Logical plan structure looks correct.

The plan properly handles:

  • Empty string filtering on Referer (line 13)
  • Domain extraction via REGEXP_REPLACE with pattern ^https?://(?:www\.)?([^/]+)/.*$ (line 12)
  • NULL filtering on computed k field (line 11)
  • Aggregation with AVG, COUNT, MIN (line 9)
  • Post-aggregation filtering on count > 100000 (line 7)

The wide projection at line 12 covering all 111 clickbench fields is expected for the hits table schema.


18-19: Physical plan correctly pushes aggregation to OpenSearch.

The pushdown context appropriately includes:

  • Filter pushdown: <>($0, '') and existence check on Referer
  • Composite aggregation for group-by on computed k (domain extraction)
  • Sub-aggregations: avg with CHAR_LENGTH script, top_hits for min(Referer)

The base64-encoded Calcite scripts enable server-side evaluation of REGEXP_REPLACE and CHAR_LENGTH expressions, which is the expected pattern for complex expression pushdown.

integ-test/src/test/resources/expectedOutput/calcite/explain_output.yaml (2)

9-14: LGTM!

The logical plan structure is internally consistent. Column references correctly trace through the plan:

  • Aggregate outputs state@0, city@1, avg_age@2
  • Project reorders to avg_age@0, state@1, age2@2
  • Sort on $1 correctly sorts by state

The test data covers relevant scenarios including aggregation, computed columns (age2=[+($2, 2)]), NULL handling (ASC-nulls-first, IS NOT NULL), and deduplication with window functions.


21-21: LGTM!

The physical plan correctly pushes down the query to OpenSearch:

  • Filter >($2, 30) maps to the range query on age
  • Composite aggregation with state and city buckets matches the logical grouping
  • SORT->[0 ASC FIRST] aligns with "order":"asc" and "missing_order":"first" on the state bucket
  • The avg_age metric aggregation is correctly nested

The expected output accurately represents the Calcite enumerable index scan path with appropriate pushdown context.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml (3)

6-14: Logical plan structure looks correct for the Calcite validation changes.

The restructured plan correctly:

  • Reorders projections to place the final column mapping (l=[$0], c=[$1], CounterID=[$2]) after filtering.
  • Applies the filter >($1, 100000) on the aggregated count before final projection.
  • Expands all 111 fields in the intermediate LogicalProject (line 12) to preserve field indices during validation.
  • Maintains consistent field index references: $103 for CounterID and $26 for URL.

The plan flow (scan → filter non-empty URL → project → filter non-null CounterID → project subset → aggregate → reproject → filter count → final project) is logically sound.


18-18: Explicit BIGINT type annotation reflects Calcite type coercion.

The change from an implicit integer literal to 100000:BIGINT is expected behavior from the Calcite validation system, which now performs explicit type annotation during plan generation. This aligns with the PR's objective of migrating type checking/coercion to Calcite's validation system.


19-19: Script payload is valid and properly encoded.

The Base64-encoded script decodes successfully to a valid JSON structure that correctly represents the CHAR_LENGTH operation with a dynamicParam: 0 reference. The DIGESTS array ["URL"] properly maps to this dynamic parameter at index 0, confirming the field binding is correct.

docs/user/ppl/cmd/bin.md (1)

503-508: No changes required. The example is already correctly marked as not-yet-implemented with an explicit TODO comment (<!-- TODO: Enable after fixing https://github.com/opensearch-project/sql/issues/4973 -->) and the ignore flag on the code block. This is the proper way to document planned features that are not yet functional.

@yuancu
Copy link
Collaborator Author

yuancu commented Jan 13, 2026

Curious how much overhead will this validation process introduce?

I have updated a profiling result in the PR description. It's profiled with this script. Based on the result, the validation layer brings no significant performance degration.

@yuancu
Copy link
Collaborator Author

yuancu commented Jan 13, 2026

Hi @ykmr1224 , do you have any concern over this PR? I foresee that it will bring some impact to the schema-on-read solution, be it in a helpful or negative way. I'm worrying that this PR will make your solution difficult to proceed.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (3)

184-197: Unused String.format argument.

The format string has only two placeholders (%s) but three arguments are provided. The third TEST_INDEX_WORK_INFORMATION argument at line 196 is never used.

🔧 Suggested fix
     assertYamlEqualsIgnoreId(
         expected,
         explainQueryYaml(
             String.format(
                 "source = %s"
                     + "| eval count_dept = ["
                     + "    source = %s | stats count(name)"
                     + "  ]"
                     + "| fields name, count_dept",
-                TEST_INDEX_WORKER, TEST_INDEX_WORK_INFORMATION, TEST_INDEX_WORK_INFORMATION)));
+                TEST_INDEX_WORKER, TEST_INDEX_WORK_INFORMATION)));

199-212: Unused String.format argument.

Same issue as above - the format string has only two %s placeholders but three arguments are provided at line 211.

🔧 Suggested fix
     assertYamlEqualsIgnoreId(
         expected,
         explainQueryYaml(
             String.format(
                 "source = %s"
                     + "| where id > ["
                     + "    source = %s | stats count(name)"
                     + "  ] + 999"
                     + "| fields name",
-                TEST_INDEX_WORKER, TEST_INDEX_WORK_INFORMATION, TEST_INDEX_WORK_INFORMATION)));
+                TEST_INDEX_WORKER, TEST_INDEX_WORK_INFORMATION)));

2016-2037: Missing @Test annotation - test will not execute.

The testComplexDedup method is missing the @Test annotation, so it will not be executed as part of the test suite.

🔧 Suggested fix
+  @Test
   public void testComplexDedup() throws IOException {
     enabledOnlyWhenPushdownIsEnabled();
🧹 Nitpick comments (4)
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_rename.yaml (1)

11-12: Physical plan correctly pushes down aggregation and scripted fields.

The physical plan appropriately translates to OpenSearch:

  • Composite aggregation with top_hits size=2 matching the dedup count
  • Script fields for computed columns with correct DIGESTS referencing source fields (gender.keyword, state.keyword)
  • Source includes/excludes properly configured

The single-line format for the physical plan JSON makes diff reviews challenging. Consider using YAML block scalar with literal style and formatting the JSON across multiple lines for improved readability and easier change tracking in future PRs—though this is a low-priority suggestion if exact-match testing requires this format.

integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java (1)

469-500: LGTM - well-structured overload following existing patterns.

The implementation correctly:

  • Validates non-empty input with a clear exception message
  • Short-circuits on first successful match
  • Propagates the last failure for diagnostic purposes
  • Has comprehensive JavaDoc

The pattern is consistent with the existing two-argument overload at lines 460-467.

🔧 Optional: Add null validation for defensive programming

For better error messages when debugging test failures, consider adding a null check:

 public static void assertYamlEqualsIgnoreId(List<String> expectedYamls, String actualYaml) {
+  if (expectedYamls == null) {
+    throw new IllegalArgumentException("expectedYamls must not be null");
+  }
   if (expectedYamls.isEmpty()) {
     throw new IllegalArgumentException("expectedYamls list must not be empty");
   }

This is optional since the existing methods in this file don't perform extensive null checks either, and test utility callers are expected to provide valid inputs.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (2)

1811-1818: TODO comment indicates a known regression.

The comment "TODO: Fix -- now it gets away from push-down" suggests that push-down optimization is not working as expected for this case. Consider creating an issue to track this.

Would you like me to open a GitHub issue to track this push-down optimization regression for the chart command with useother=true?


2066-2074: Non-deterministic plan output handling.

The use of List.of() with multiple expected plan files (explain_dedup_expr4.yaml, alternatives) handles legitimate optimizer non-determinism. This is an acceptable pattern, but consider documenting why multiple valid outputs exist (e.g., different but equivalent join orderings, filter placements).

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9ecc8ff and 35ac017.

📒 Files selected for processing (8)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_rename.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative2.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.java
🧰 Additional context used
📓 Path-based instructions (6)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java: - Integration tests MUST use valid test data from resources

  • Verify test data files exist in integ-test/src/test/resources/
  • Check test assertions are meaningful and specific
  • Validate tests clean up resources after execution
  • Ensure tests are independent and can run in any order
  • Flag tests that reference non-existent indices (e.g., EMP)
  • Verify integration tests are in correct module (integ-test/)
  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify NULL input tests for all new functions

  • Check boundary condition tests (min/max values, empty inputs)
  • Validate error condition tests (invalid inputs, exceptions)
  • Ensure multi-document tests for per-document operations
  • Flag smoke tests without meaningful assertions
  • Check test naming follows pattern: test
  • Verify test data is realistic and covers edge cases
  • Verify test coverage for new business logic
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
integ-test/src/test/resources/**/*

⚙️ CodeRabbit configuration file

integ-test/src/test/resources/**/*: - Verify test data is realistic and representative

  • Check data format matches expected schema
  • Ensure test data covers edge cases and boundary conditions

Files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_rename.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push.yaml
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_rename.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: test-sql-cli-integration (21)
🔇 Additional comments (9)
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative1.yaml (1)

1-12: LGTM! Well-structured expected output for dedup-with-expressions scenario.

The logical plan correctly models:

  • ROW_NUMBER windowing partitioned by computed expressions ($4, $5 = new_gender, new_state from LOWER)
  • Null filtering on partition keys before dedup
  • Sort order preservation (ASC-nulls-first, DESC-nulls-last)

The physical plan appropriately shows:

  • Composite aggregation with scripted fields for the computed partition keys
  • top_hits with size=2 matching the <= 2 dedup filter
  • Base64-encoded calcite scripts for OpenSearch execution

This alternative1 variant correctly demonstrates dedup partitioning on expression-derived columns.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative2.yaml (1)

1-12: LGTM! Correctly represents an alternative dedup optimization path.

This alternative2 variant differs from alternative1 in a meaningful way:

  • Logical plan: PARTITION BY $1, $3 (original gender, state fields) vs alternative1's $4, $5 (computed new_gender, new_state)
  • Physical plan: Uses direct field terms ("field":"gender.keyword") instead of scripted aggregation sources

The distinction correctly demonstrates that the optimizer may choose to partition on either the source fields or the computed expressions depending on the query context. Both the logical structure and OpenSearch request payload are consistent with this optimization choice.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_rename.yaml (1)

1-10: Logical plan structure is correct and well-formed.

The logical plan correctly represents the dedup + rename transformation:

  • ROW_NUMBER() OVER (PARTITION BY $4, $5) for deduplication partitioning
  • <=($6, 2) filter for allowedDuplicates=2
  • LOWER($4) and LOWER($7) for the rename expressions
  • Proper null handling with IS NOT NULL filters
  • Sort directions preserved (ASC-nulls-first, DESC-nulls-last)
integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push.yaml (2)

1-5: LGTM - Logical plan structure is correct.

The logical plan correctly represents a count aggregation scenario:

  • LogicalSystemLimit enforces the query size limit
  • LogicalAggregate(group=[{}], cnt=[COUNT()]) represents a global COUNT aggregation
  • CalciteLogicalIndexScan on the standard test index

The hierarchy and notation follow Calcite conventions.


6-7: Physical plan correctly represents aggregation pushdown.

The OpenSearchRequestBuilder configuration is appropriate for a count aggregation:

  • size=0 correctly indicates no documents should be fetched (aggregation-only)
  • track_total_hits=2147483647 (Integer.MAX_VALUE) ensures accurate counting
  • AGGREGATION and LIMIT pushdowns are properly captured in PushDownContext

The rel#:LogicalAggregate.NONE.[] pattern appears to be ID-normalized for stable test comparisons (consistent with assertYamlEqualsIgnoreId usage mentioned in the PR).

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml (1)

6-12: LGTM! Column references are now consistent across the plan.

The updated expected output correctly reflects the fix where dedup partitioning and null checks now reference $1, $3 (gender, state) instead of $4, $5 (new_gender, new_state). This aligns the PARTITION BY columns with the sort columns used in both LogicalSystemLimit (line 3) and LogicalSort (line 8), and the physical plan's composite buckets on gender.keyword and state.keyword are consistent with this change.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (3)

32-49: LGTM!

The test class properly extends ExplainIT, enables Calcite mode, and loads the necessary test indices. The structure follows established integration test patterns.


444-482: LGTM!

The timechart tests verify the correct calculation formulas for per_second, per_minute, per_hour, and per_day functions. The expressions correctly include the time unit conversions and TIMESTAMPDIFF calculations.


1306-1331: LGTM - proper resource cleanup pattern.

Good use of try-finally blocks to ensure resetQueryBucketSize() is called even if assertions fail. This pattern is consistently applied throughout the test class for test isolation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

calcite calcite migration releated enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Type Conversion - Leverage RelDataType from Calcite in V3 engine. [RFC] Implicit Type Coercion with Calcite

3 participants